From the data summary, we can see that the average miles per gallon (mpg) is about 20.1, with a range from 10.4 to 33.9. Similarly, other variables like horsepower (hp), weight (wt), and the number of cylinders (cyl) show considerable variation.
data(mtcars)
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
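To put a number on "considerable variation", one option is the coefficient of variation (standard deviation divided by mean), which is unit-free and so comparable across variables. The snippet below is a minimal sketch; the choice of the four variables and the names `vars` and `cv` are ours, not part of the original analysis.

```r
data(mtcars)

# Variables discussed in the summary above (an illustrative selection).
vars <- c("mpg", "hp", "wt", "cyl")

# Coefficient of variation: sd / mean, computed column by column.
cv <- sapply(mtcars[vars], function(x) sd(x) / mean(x))
print(round(cv, 2))
```

A larger value means the variable is more spread out relative to its own mean.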
Next, we investigate the relationships between different variables. The correlation matrix reveals the strength and direction of the linear relationship between each pair of variables. There is a strong negative correlation between horsepower (hp) and miles per gallon (mpg) (-0.78), suggesting that cars with higher horsepower tend to have lower fuel efficiency. Weight (wt) and mpg show an even stronger negative correlation (-0.87), reinforcing the idea that heavier cars are less fuel-efficient.
cor(mtcars)
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
## cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
## disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
## hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
## drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
## wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
## qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
## vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
## am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
## gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
## carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
## qsec vs am gear carb
## mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
## cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
## hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
## drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
## wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
## qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
## am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
## gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
## carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
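The full matrix is hard to scan, so it can help to pull out only the column for mpg and sort it. This is a small sketch using base R; the name `cor_mpg` is ours.

```r
data(mtcars)

# Correlation of every variable with mpg, dropping mpg's correlation with itself.
cor_mpg <- cor(mtcars)[, "mpg"]
cor_mpg <- cor_mpg[names(cor_mpg) != "mpg"]

# Sorted from most negative to most positive.
print(round(sort(cor_mpg), 2))
```

Reading the sorted vector from both ends shows which variables move most strongly against or with mpg.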
The histogram of miles per gallon (mpg) shows that most cars have mpg values between 15 and 25. The distribution appears slightly right-skewed, indicating that there are a few cars with exceptionally high mpg.
library(ggplot2)
library(plotly)  # provides ggplotly()

p <- ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(binwidth=1, fill="#b44de0", color="black") +
  theme_minimal() +
  labs(title="Distribution of Miles per Gallon",
       x="Miles per Gallon",
       y="Count")
# Interactive plot for HTML output, static plot otherwise.
if (knitr::is_html_output()) {
  ggplotly(p)
} else {
  print(p)
}
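The impression of right skew from the histogram can be checked numerically. The sketch below computes the moment-based sample skewness by hand in base R (no extra package assumed); a positive value is consistent with a longer right tail. The variable names are ours.

```r
data(mtcars)

x <- mtcars$mpg
m <- mean(x)

# Moment-based sample skewness: third central moment over
# the second central moment raised to the power 3/2.
skew <- mean((x - m)^3) / mean((x - m)^2)^1.5
print(skew)

# A mean above the median points in the same direction.
print(c(mean = m, median = median(x)))
```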
The boxplot comparing mpg across different numbers of cylinders (cyl) shows that cars with fewer cylinders generally have higher mpg. Specifically, 4-cylinder cars have the highest median mpg, followed by 6-cylinder and then 8-cylinder cars. This indicates that cars with more cylinders tend to be less fuel-efficient.
p <- ggplot(mtcars, aes(x=factor(cyl), y=mpg)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title="Miles per Gallon by Number of Cylinders",
       x="Number of Cylinders",
       y="Miles per Gallon")
if (knitr::is_html_output()) {
  ggplotly(p)
} else {
  print(p)
}
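The medians that the boxplot draws can also be computed directly. This is a minimal base R sketch; `med_by_cyl` is our own name.

```r
data(mtcars)

# Median mpg within each cylinder group (names are "4", "6", "8").
med_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, median)
print(med_by_cyl)

# Group sizes, to judge how much weight each box deserves.
print(table(mtcars$cyl))
```

Checking the group sizes alongside the medians is a useful habit, since a box built from a handful of cars is less reliable than one built from many.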
The scatter plot shows the relationship between horsepower (hp) and miles per gallon (mpg), with points colored by number of cylinders (cyl). The overall negative relationship between hp and mpg is clearly visible, and it appears to hold, more weakly, within each cylinder group. The groups also separate from one another: cars with more cylinders tend to have both higher horsepower and lower mpg.
p <- ggplot(mtcars, aes(x=hp, y=mpg, color=factor(cyl))) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  labs(title="Miles per Gallon vs Horsepower by Cylinders",
       x="Horsepower",
       y="Miles per Gallon",
       color = "Cylinders") +
  scale_color_manual(values = c("4" = "#1f77b4", "6" = "#ff7f0e", "8" = "#2ca02c"))
if (knitr::is_html_output()) {
  ggplotly(p)
} else {
  print(p)
}
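To check how the hp and mpg relationship behaves inside each cylinder group, rather than judging it by eye, we can compute the correlation per group. This is a rough sketch in base R; the names `by_cyl` and `r_by_cyl` are ours, and with so few cars per group the estimates should be read with caution.

```r
data(mtcars)

# Split the data frame into one piece per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# Pearson correlation between hp and mpg within each piece.
r_by_cyl <- sapply(by_cyl, function(d) cor(d$hp, d$mpg))
print(round(r_by_cyl, 2))
```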
The pairwise scatter plots provide a comprehensive view of relationships between all pairs of variables. We can observe that both weight (wt) and displacement (disp) have strong negative relationships with mpg, while positively correlating with each other. This helps identify multicollinearity and understand how variables interact with one another.
pairs(mtcars)
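One common way to quantify the multicollinearity suggested by the scatter plots is the variance inflation factor (VIF): for each predictor, regress it on the others and compute 1 / (1 - R²). The sketch below does this by hand in base R. The set of predictors (wt, disp, hp) is chosen only for illustration, and `predictors` and `vif` are our own names.

```r
data(mtcars)

# Illustrative predictor set; swap in whichever variables are of interest.
predictors <- c("wt", "disp", "hp")

# VIF for each predictor: regress it on the remaining ones,
# then apply 1 / (1 - R^2).
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  fit <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient estimate becomes less precise when all of them enter the same model.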
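The linear regression models mpg as a function of horsepower (hp), weight (wt), and number of cylinders (cyl). Of the three predictors, only weight is statistically significant at the 5% level (p ≈ 0.0002): each additional 1,000 lb of weight is associated with about 3.2 fewer miles per gallon, holding the other variables fixed. The coefficients for cylinders (p ≈ 0.098) and horsepower (p ≈ 0.14) are negative, as expected, but not significant at that level, which is likely due in part to the strong correlations among the predictors themselves. Note that the raw coefficients are not directly comparable in size, because the predictors are measured in different units. Together, the three variables explain about 84% of the variance in mpg (R² = 0.843), which supports the earlier observation that heavier cars are less fuel-efficient.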
model <- lm(mpg ~ hp + wt + cyl, data=mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ hp + wt + cyl, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9290 -1.5598 -0.5311 1.1850 5.8986
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.75179 1.78686 21.687 < 2e-16 ***
## hp -0.01804 0.01188 -1.519 0.140015
## wt -3.16697 0.74058 -4.276 0.000199 ***
## cyl -0.94162 0.55092 -1.709 0.098480 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
## F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
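Two follow-up checks can make the regression output easier to interpret: confidence intervals for the coefficients, and a refit on standardized variables so that coefficient sizes are comparable across predictors. The sketch below uses only base R; the names `ci`, `scaled`, and `std_model` are ours, and standardizing is just one of several ways to compare effect sizes.

```r
data(mtcars)

# Same model as above.
model <- lm(mpg ~ hp + wt + cyl, data = mtcars)

# 95% confidence intervals; an interval that contains 0 corresponds
# to a coefficient that is not significant at the 5% level.
ci <- confint(model, level = 0.95)
print(round(ci, 3))

# Refit on z-scored variables so each coefficient is in
# "standard deviations of mpg per standard deviation of predictor".
scaled <- as.data.frame(scale(mtcars[c("mpg", "hp", "wt", "cyl")]))
std_model <- lm(mpg ~ hp + wt + cyl, data = scaled)
print(round(coef(std_model), 3))
```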
The heatmap visualizes the correlation matrix, showing strong negative correlations between mpg and both horsepower (hp) and weight (wt), and a moderate positive correlation between horsepower (hp) and weight (wt) (0.66).
library(reshape2)  # provides melt()

cor_matrix <- cor(mtcars)
# Reshape to long format with columns Var1, Var2, value.
melted_cor_matrix <- melt(cor_matrix)
p <- ggplot(melted_cor_matrix, aes(x=Var1, y=Var2, fill=value)) +
  geom_tile() +
  scale_fill_gradient2(low="aquamarine", high="#C154C1", mid="white", midpoint=0, limits=c(-1,1)) +
  theme_minimal() +
  labs(title="Correlation Heatmap")
print(p)
The density plot for miles per gallon (mpg) shows a smooth distribution curve, indicating the probability density of different mpg values. The plot confirms that most cars have mpg values around 20, with a long tail on the right side, indicating a few highly fuel-efficient cars.
p <- ggplot(mtcars, aes(x=mpg)) +
  geom_density(fill="blue", alpha=0.5) +
  theme_minimal() +
  labs(title="Density Plot of Miles per Gallon",
       x="Miles per Gallon",
       y="Density")
if (knitr::is_html_output()) {
  ggplotly(p)
} else {
  print(p)
}